Towards an optimal weighting of context words based on distance

Authors

  • Bernard Brosseau-Villeneuve
  • Jian-Yun Nie
  • Noriko Kando
Abstract

Word Sense Disambiguation (WSD) often relies on a context model or vector constructed from the words that co-occur with the target word within the same text window. In most cases, a fixed-size window is used, and its size is determined by trial and error. In addition, words within the same window are weighted uniformly regardless of their distance to the target word. Intuitively, it seems more reasonable to assign a stronger weight to context words closer to the target word. However, it is difficult to manually define the optimal weighting function based on distance. In this paper, we propose an unsupervised method for determining the optimal weights for context words according to their distance. The general idea is that the optimal weights should maximize the similarity of two context models of the target word generated from two random samples. This principle is applied to both English and Japanese. The context models using the resulting weights are used in WSD tasks on SemEval data. Our experimental results show that substantial improvements in WSD accuracy can be obtained using the automatically defined weighting scheme.
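The general idea stated in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or the optimization procedure, so the code below makes loud assumptions: cosine similarity stands in for the similarity measure, and picking the best of a few hand-written candidate weight vectors stands in for the actual optimization. All function names and the toy sentences are hypothetical.

```python
import math
from collections import Counter


def context_model(sentences, target, weights):
    """Build a distance-weighted bag of context words for `target`.

    weights[d - 1] is the weight of a word at distance d from the target;
    words farther away than len(weights) are ignored.
    """
    model = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            for j, word in enumerate(tokens):
                d = abs(i - j)
                if 1 <= d <= len(weights):
                    model[word] += weights[d - 1]
    return model


def cosine(a, b):
    """Cosine similarity of two sparse vectors (assumed measure, not the paper's)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sample_similarity(sample_a, sample_b, target, weights):
    """Similarity of the two context models built from two samples."""
    return cosine(context_model(sample_a, target, weights),
                  context_model(sample_b, target, weights))


def best_weights(sample_a, sample_b, target, candidates):
    """Pick the candidate weighting that maximizes cross-sample similarity.

    A stand-in for a real optimizer: it only compares the given candidates.
    """
    return max(candidates,
               key=lambda w: sample_similarity(sample_a, sample_b, target, w))


# Toy data (hypothetical): two random samples of sentences containing "bank".
sample_a = [["she", "deposited", "money", "at", "the", "bank", "today"],
            ["the", "bank", "approved", "the", "loan", "quickly"]]
sample_b = [["he", "withdrew", "money", "from", "the", "bank", "yesterday"],
            ["a", "loan", "from", "the", "bank", "was", "approved"]]

candidates = {
    "uniform": [1.0, 1.0, 1.0, 1.0],
    "linear decay": [1.0, 0.75, 0.5, 0.25],
    "adjacent only": [1.0, 0.0, 0.0, 0.0],
}
for name, w in candidates.items():
    print(name, round(sample_similarity(sample_a, sample_b, "bank", w), 3))
chosen = best_weights(sample_a, sample_b, "bank", list(candidates.values()))
```

Because cosine similarity is invariant to scaling, weight vectors in this sketch are only meaningful up to a constant factor; a real implementation would presumably use much larger samples and a continuous optimizer rather than a fixed candidate list.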




Publication date: 2010